Abstract - Inductive learning algorithms are attractive methods for
generating hierarchical classifiers. They generate a hypothesis of the
target concept on the basis of a set of labeled examples. This paper
presents some rule generation methods, their usefulness for a
rule-based classifier, and their quality of classification for a
medical decision problem.
I.  Introduction

Machine learning [1] is an attractive approach for building decision
support systems. For this type of software, the quality of the
knowledge base plays the key role. In many cases we encounter the
following problems:

•  the experts cannot formulate the rules for the decision problem,
because they might not have the knowledge needed to develop effective
algorithms (e.g. human face recognition from images),

•  we want to discover rules in large databases (data mining), e.g. to
analyze outcomes of medical treatments from patient databases; this
situation is typical when designing a telemedical decision support
system whose knowledge base is generated on the basis of a large
number of hospital databases,

•  the program has to adapt dynamically to changing conditions.

Those situations are typical for medical knowledge. In many cases the
physician cannot formulate the rules that are used to make a decision,
or the formulated rules are incomplete.

In this paper we compare the heuristic classifier (given by experts)
with three others generated by the chosen inductive learning methods.

The content of the work is as follows. Section II introduces the idea
of the inductive decision tree algorithms and of the methods for
learning sets of rules. In Section III we describe the mathematical
model of the acute abdominal pain decision problem. Section IV presents
the results of the experimental investigations of the algorithms.
Section V concludes the paper.

II.  Algorithms

We chose three inductive learning algorithms:

•  the C4.5 algorithm given by J. R. Quinlan [3,4],

•  the Fuzzy Decision Tree algorithm FID 3.0 given by C. Janikow [3],

•  the rule generation algorithm AQ given by R. Michalski [5].


Inductive  decision  tree 

The C4.5 and FID algorithms are modifications of the ID3 method of
decision tree generation. Therefore, let us present the main idea of
ID3 below.

Create a Root node for the tree.

IF all examples are positive
   THEN return the single-node tree Root with label yes.

IF all examples are negative
   THEN return the single-node tree Root with label no.

IF the set of attributes is empty
   THEN return the single-node tree Root with label = most common
        value of the label in the set of examples.

Choose "the best" attribute A from the set of attributes.

FOR EACH possible value vi of attribute A
   1. Add a new tree branch below Root, corresponding to the test
      A = vi.
   2. Let Evi be the subset of the set of examples that has value vi
      for A.
   3. IF Evi is empty
         THEN below this new branch add a leaf node with label = most
              common value of the label in the set of examples
         ELSE below this new branch add the subtree returned by
              calling this function recursively for Evi and the set
              of attributes without A.
END

RETURN Root

The central choice in the ID3 algorithm is selecting “the best”
attribute (which attribute to test at each node in the tree). The
algorithm uses the information gain, which measures how well a given
attribute separates the training examples according to the target
classification. This measure is based on Shannon’s entropy of the
set S:
$$\mathrm{Entropy}(S) = \sum_{i=1}^{M} -p_i \log_2 p_i , \qquad (1)$$

where $p_i$ is the proportion of $S$ belonging to class $i$
($i \in \mathcal{M}$, $\mathcal{M} = \{1, 2, \ldots, M\}$).

The information gain of an attribute A relative to the collection of
examples S is defined as

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, \mathrm{Entropy}(S_v) , \qquad (2)$$

where $\mathrm{values}(A)$ is the set of all possible values for
attribute $A$ and $S_v$ is the subset of $S$ for which $A = v$.
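As an illustration, the ID3 procedure and the measures (1) and (2) can be sketched in Python as follows. This is only a minimal sketch and not the implementation of C4.5 or FID: the function names, the dictionary representation of examples and of the tree, and the four toy examples are assumptions made for this illustration, and the toy data have no clinical meaning.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels, Eq. (1)."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())


def gain(examples, attribute, target):
    """Information gain of `attribute` over `examples`, Eq. (2)."""
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy([e[target] for e in examples]) - remainder


def id3(examples, attributes, target, domains):
    """Build a decision tree as nested dicts; leaves are class labels."""
    labels = [e[target] for e in examples]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or no attribute is left to test.
    if len(set(labels)) == 1 or not attributes:
        return majority
    # "The best" attribute is the one with the highest information gain.
    best = max(attributes, key=lambda a: gain(examples, a, target))
    rest = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in domains[best]:
        subset = [e for e in examples if e[best] == value]
        # An empty subset becomes a leaf with the parent's majority label.
        tree[best][value] = id3(subset, rest, target, domains) if subset else majority
    return tree


# Invented toy data (hypothetical, for illustration only).
examples = [
    {"guarding": "yes", "jaundice": "no", "diagnosis": "appendicitis"},
    {"guarding": "yes", "jaundice": "no", "diagnosis": "appendicitis"},
    {"guarding": "no", "jaundice": "yes", "diagnosis": "cholecystitis"},
    {"guarding": "yes", "jaundice": "yes", "diagnosis": "cholecystitis"},
]
domains = {"guarding": ["yes", "no"], "jaundice": ["yes", "no"]}
tree = id3(examples, ["guarding", "jaundice"], "diagnosis", domains)
print(tree)  # {'jaundice': {'yes': 'cholecystitis', 'no': 'appendicitis'}}
```

In this toy set the attribute "jaundice" separates the two classes completely (gain 1.0), whereas "guarding" has a gain of about 0.31, so the sketch tests "jaundice" at the root.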

The C4.5 algorithm modifies ID3 in that, at the beginning, the tree
generation procedure does not use the whole set of examples. The FID
algorithm assumes that the attribute values are fuzzy observations.

Learning  set  of  rules 

Algorithms such as CN2 [1] or AQ [5] are based on the learn-one-rule
(LOR) strategy: learn one rule, remove the data it covers, then iterate
the process. This sequential covering procedure is presented below.

Sequential_covering(examples)

   R := ∅.
   P := examples.
   DO WHILE P ≠ ∅
      r := learn-one-rule(examples, P).
      R := R ∪ {r}.
      remove from P all examples covered by r.
   END.
   RETURN R.
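The covering loop above can be sketched in Python as follows. The rule search used here is a deliberately simplified, seed-based greedy stand-in and is not the actual search of AQ or CN2; the function names, the representation of a rule as a pair (conditions, class), and the invented toy examples are assumptions made only for this illustration.

```python
def matches(conditions, example):
    """True if the example satisfies every attribute = value test."""
    return all(example[a] == v for a, v in conditions.items())


def learn_one_rule(examples, uncovered, target):
    """Grow one rule from a seed example (simplified greedy search).

    Tests are copied from the seed one at a time until the rule covers
    no example of another class or no attribute is left.
    """
    seed = uncovered[0]
    attributes = [a for a in seed if a != target]
    conditions = {}
    while True:
        covered = [e for e in examples if matches(conditions, e)]
        wrong = [e for e in covered if e[target] != seed[target]]
        free = [a for a in attributes if a not in conditions]
        if not wrong or not free:
            return conditions, seed[target]
        # Add the seed's test that excludes most wrongly covered examples.
        best = max(free, key=lambda a: sum(e[a] != seed[a] for e in wrong))
        conditions[best] = seed[best]


def sequential_covering(examples, target):
    """Learn rules one by one, removing the examples each rule covers."""
    rules = []
    uncovered = list(examples)
    while uncovered:
        rule = learn_one_rule(examples, uncovered, target)
        rules.append(rule)
        # Every rule covers at least its seed, so the loop terminates.
        uncovered = [e for e in uncovered if not matches(rule[0], e)]
    return rules


# Invented toy data (hypothetical, for illustration only).
examples = [
    {"guarding": "yes", "jaundice": "no", "diagnosis": "appendicitis"},
    {"guarding": "yes", "jaundice": "no", "diagnosis": "appendicitis"},
    {"guarding": "no", "jaundice": "yes", "diagnosis": "cholecystitis"},
    {"guarding": "yes", "jaundice": "yes", "diagnosis": "cholecystitis"},
]
rules = sequential_covering(examples, "diagnosis")
for conditions, label in rules:
    print(conditions, "->", label)
```

Because each rule is grown from an example that is still uncovered, every pass removes at least one example from P, which is what guarantees that the loop ends.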

The LOR method is similar to the ID3 algorithm presented above. The
LOR algorithm follows only the most promising branch in the tree at
each step and returns only one rule, which covers at least some of the
examples.

We have presented only the idea of the algorithms. Of course, the
methods discussed are more complicated. For example, we have not
presented the pruning methods that protect against the situation where
the tree overfits the training set [1, 3, 8].
III.  Model of acute abdominal pain diagnosis

The mathematical model of the diagnosis of acute abdominal pain (AAP)
was simplified. However, the experts from the Clinic of Surgery,
Wroclaw Medical Academy, regarded the stated diagnosis problem as very
useful.

It leads to the following classification of the AAP:

1.  appendicitis,

2.  diverticulitis,

3.  small-bowel obstruction,

4.  perforated peptic ulcer,

5.  cholecystitis,

6.  pancreatitis,

7.  non-specific abdominal pain,

8.  rare disorders of the “acute abdomen”.

Although the set of symptoms necessary to correctly assess the
existing AAP is fairly wide, in practice the results of 36
(non-continuous) examinations are used for the diagnosis; they are
presented in Table I.

Table I

CLINICAL FEATURES CONSIDERED

No.  Feature                 No.  Feature                        No.  Feature
1    sex                     13   appetite                       25   systolic blood pressure
2    age                     14   bowels                         26   diastolic blood pressure
3    site on onset           15   micturition                    27   movement (of abdomen)
4    site on present         16   previous indigestion           28   distension
5    intensity               17   jaundice                       29   tenderness (site)
6    aggravating factors     18   previous similar pain          30   Blumberg’s sign
7    relieving factors       19   previous surgery (abdominal)   31   guarding
8    progress                20   drugs                          32   rigidity
9    duration                21   mood                           33   swellings
10   character on onset      22   color                          34   Murphy’s sign
11   character on present    23   temperature                    35   abdominal auscultation (bowel sounds)
12   nausea and vomiting     24   pulse                          36   rectal examinations

Heuristic  decision  tree 

The expert physicians gave the decision tree [4] depicted in Fig. 1.
The numbers in the leaves are the numbers of the diagnoses presented
above. The numbers in the internal nodes correspond to the following
diagnoses:

9.  acute enteropathy,

10.  acute disorders of the digestive system,

11.  others.

IV.  Experimental  investigation 
The presented algorithms C4.5, FID and AQ were used to create rules
for the AAP decision problem. Their frequencies of correct
classification were compared with the quality of the heuristic
classifier [9, 10].

The data set was gathered in the Surgery Clinic. It contains 476
learning examples.

For each learning method the following experiment was performed:

•  40 examples were chosen from the learning set (in accordance with
the frequency of class appearance); this set was used for testing,

•  the remaining 436 examples were used for training.

This procedure was repeated 20 times for each of the algorithms. The
results of the experiments are presented in Table II and depicted in
Fig. 2.
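The experimental protocol can be sketched in Python as follows. This is a sketch under stated assumptions rather than the code used in the experiments: the function names, the largest-remainder rounding of the class quotas, the random seed, the invented class sizes and the majority-class baseline are all hypothetical, and the sketch reports overall accuracy rather than the per-class frequencies used in the paper.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size, rng):
    """Return (train_idx, test_idx); the test part follows class frequencies."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    n = len(labels)
    # Largest-remainder rounding so that the quotas sum to test_size.
    exact = {y: test_size * len(idx) / n for y, idx in by_class.items()}
    quota = {y: int(q) for y, q in exact.items()}
    leftover = test_size - sum(quota.values())
    by_remainder = sorted(exact, key=lambda y: exact[y] - quota[y], reverse=True)
    for y in by_remainder[:leftover]:
        quota[y] += 1
    test_idx = []
    for y, idx in by_class.items():
        test_idx.extend(rng.sample(idx, quota[y]))
    chosen = set(test_idx)
    train_idx = [i for i in range(n) if i not in chosen]
    return train_idx, test_idx


def repeated_holdout(examples, labels, train, test_size=40, repeats=20, seed=0):
    """Average test accuracy over `repeats` random stratified splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        train_idx, test_idx = stratified_split(labels, test_size, rng)
        predict = train([examples[i] for i in train_idx],
                        [labels[i] for i in train_idx])
        hits = sum(predict(examples[i]) == labels[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


def train_majority(train_x, train_y):
    """Placeholder learner: always predicts the most frequent class."""
    majority = Counter(train_y).most_common(1)[0][0]
    return lambda x: majority


# Invented class sizes summing to 476 (the real distribution is not given).
sizes = [140, 30, 50, 40, 70, 30, 100, 16]
labels = [c for c, size in enumerate(sizes, start=1) for _ in range(size)]
examples = list(range(len(labels)))  # placeholders for feature vectors
accuracy = repeated_holdout(examples, labels, train_majority)
print(f"mean accuracy of the baseline: {accuracy:.3f}")
```

Any learner that takes training examples with their labels and returns a prediction function could be passed in place of `train_majority`.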


Fig. 1. Heuristic classifier for the AAP diagnosis problem


Table II

FREQUENCY OF CORRECT CLASSIFICATION (%)

Class number   Heuristic   AQ      C4.5    FID
1              79.1        90.5    95.8    86.7
2              88.2        55.0    92.3    100
3              93.1        95.0    95.8    100
4              67.1        90.0    95.6    66.7
5              82.5        98.8    86.9    83.3
6              84.4        85.0    96.2    75.0
7              84.7        97.5    91.5    75.0
8              88.2        75.0    92.3    50.0
average        83.0        85.8    93.3    80.4

The results of the tests are clear. The classifier given by the C4.5
algorithm is better than the heuristic one for every class. The AQ and
FID algorithms give better results for some classes, but for others
the frequency of correct classification is very low (e.g. 55.0 for
class 2 with AQ and 50.0 for class 8 with FID). The experts reviewed
the structures of the classifiers given by the inductive learning
algorithms. They confirmed that all of the rules were correct and that
the heuristic classifier may have been incomplete.

V.  Conclusion

Methods of inductive learning were presented. The classifiers
generated by those algorithms were applied to a medical decision
problem (recognition of acute abdominal pain). The test results were
compared with the recognition quality of the heuristic algorithm.


It must be emphasised that we have not proposed a method of
"computer diagnosis". What we have proposed are algorithms that can be
used to help the clinician make his own diagnosis. The superiority of
the empirical results for the inductive learning classifiers over the
heuristic one demonstrates the effectiveness of the proposed concept
in such computer-aided medical diagnosis problems. The advantages of
the proposed methods make them attractive for a wide range of
applications in medicine, which might significantly improve the
quality of the care that the clinician can give to his patient.